fix: send FinishedHeight using safe height when finalized unavailable#72
init4samwise wants to merge 1 commit into main
Conversation
When the host chain doesn't report finalized blocks (beacon client not synced, checkpoint sync incomplete, or testnet without proper finality reporting), the WAL grows unbounded because FinishedHeight is never sent. This change adds a fallback to use the safe block height for FinishedHeight when finalized blocks aren't available. Safe blocks are unlikely to reorg, making them a reasonable threshold for clearing the WAL. Closes ENG-1830
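The fallback described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual implementation: `PairedHeights` mirrors the struct named in the PR, but `pick_wal_clear_height` is a hypothetical helper invented here to show the selection logic.

```rust
/// Mirrors the `PairedHeights` struct referenced in the PR: a host-chain
/// height paired with the corresponding rollup height.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PairedHeights {
    host: u64,
    rollup: u64,
}

/// Hypothetical helper: returns the host height up to which the WAL may be
/// cleared. Uses the finalized height when the host reports one; otherwise
/// falls back to the safe height (safe blocks are unlikely to reorg).
fn pick_wal_clear_height(finalized: PairedHeights, safe: PairedHeights) -> u64 {
    if finalized.host != 0 {
        finalized.host
    } else {
        safe.host
    }
}

fn main() {
    // Host reports no finalized blocks: fall back to the safe height.
    let finalized = PairedHeights { host: 0, rollup: 0 };
    let safe = PairedHeights { host: 120, rollup: 90 };
    assert_eq!(pick_wal_clear_height(finalized, safe), 120);

    // Finalized blocks available: use the finalized height as before.
    let finalized = PairedHeights { host: 100, rollup: 80 };
    assert_eq!(pick_wal_clear_height(finalized, safe), 100);
}
```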
```diff
-let PairedHeights { host: _, rollup: safe_ru_height } =
-    self.load_safe_block_heights(ru_height)?;
+let safe_heights = self.load_safe_block_heights(ru_height)?;
+let safe_ru_height = safe_heights.rollup;
```
Yes, this is just refactoring to capture the full `PairedHeights` struct so we can use `safe_heights.host` in the fallback below. Happy to revert to the destructuring style if preferred — I can rename it to make the intent clearer:
```rust
let safe_heights @ PairedHeights { rollup: safe_ru_height, .. } =
    self.load_safe_block_heights(ru_height)?;
```

Or keep the current approach. Let me know your preference.
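For reference, the `@` binding suggested above captures the whole value while also destructuring one field, so both names are usable afterwards. A standalone illustration (with a stand-in `PairedHeights`, not the PR's actual types):

```rust
// Stand-in for the struct in the PR; `Copy` keeps the `@` pattern simple.
#[derive(Debug, Clone, Copy)]
struct PairedHeights {
    host: u64,
    rollup: u64,
}

fn main() {
    let result: Result<PairedHeights, ()> = Ok(PairedHeights { host: 42, rollup: 17 });

    // One pattern binds the whole value as `safe_heights` AND the `rollup`
    // field as `safe_ru_height`.
    let safe_heights @ PairedHeights { rollup: safe_ru_height, .. } =
        result.expect("heights available");

    assert_eq!(safe_ru_height, 17);
    assert_eq!(safe_heights.host, 42);
}
```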
```rust
// and rollup finalized heights are both 0. Otherwise, we load the finalized
// RU header and set the finalized block hash.
let finalized_ru_block_hash =
    self.set_finalized_ru_block_hash(finalized_heights, genesis_ru_hash)?;
```
update the rustdoc for `set_finalized_ru_block_hash` to be more informative.
Good point. I'll update this in the next commit. The current doc doesn't capture the full picture — it should also document:
- When it returns genesis hash (both heights = 0)
- When it loads and sets the actual finalized header
- The relationship with the new safe-height fallback
Will push an update.
```rust
// Fallback: use safe height when finalized isn't available.
// Safe blocks are unlikely to reorg, so this is a reasonable
// threshold for clearing the WAL.
self.update_highest_processed_height(safe_heights.host)?;
```
in what cases will this fail? will failures cause difficult-to-recover issues?
Good question. Looking at `update_highest_processed_height`:

Failure cases:
1. `sealed_header(safe_heights.host)?` — fails if the DB read errors. Returns `Err`, which propagates up.
2. `.expect("db inconsistent...")` — panics if the header doesn't exist. This shouldn't happen, since `load_safe_block_heights` already validated these heights exist, but it's a hard crash if the DB is corrupted.
3. `update_exex_head(...)` → `self.host.events.send(...)` — fails if the reth node has shut down (channel closed).
Recovery:
- DB errors (1): Node crashes, restarts, resumes from last consistent state. No data loss.
- DB inconsistency (2): Indicates deeper corruption — needs investigation. Same as existing behavior.
- Channel closed (3): Node is shutting down anyway, graceful exit.
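The three failure modes above can be sketched in isolation. Everything here is a hypothetical stand-in (not the real reth/ExEx APIs): `sealed_header` and `update_highest_processed_height` only model the shapes of the failures discussed, with an `mpsc` channel standing in for the node's event channel.

```rust
use std::sync::mpsc;

#[derive(Debug)]
enum DbError {
    ReadFailed,
}

// Hypothetical stand-in for `sealed_header`: Ok(Some(header)) on success,
// Ok(None) if the header is unexpectedly absent, Err on a DB read failure.
fn sealed_header(height: u64) -> Result<Option<String>, DbError> {
    match height {
        0 => Err(DbError::ReadFailed), // mode 1: DB read error
        h if h < 100 => Ok(None),      // mode 2: inconsistent DB
        h => Ok(Some(format!("header@{h}"))),
    }
}

fn update_highest_processed_height(
    height: u64,
    events: &mpsc::Sender<String>,
) -> Result<(), DbError> {
    // Mode 1: `?` propagates the DB error to the caller; the node can
    // restart and resume from its last consistent state.
    let header = sealed_header(height)?
        // Mode 2: a hard crash if the header is missing despite having
        // been validated — indicates deeper corruption.
        .expect("db inconsistent: header missing for validated height");
    // Mode 3: sending fails only if the receiver (the node) has shut down,
    // in which case we're exiting anyway.
    let _ = events.send(header);
    Ok(())
}

fn main() {
    let (tx, rx) = mpsc::channel();
    assert!(update_highest_processed_height(0, &tx).is_err()); // DB error
    assert!(update_highest_processed_height(150, &tx).is_ok()); // happy path
    assert_eq!(rx.recv().unwrap(), "header@150");
}
```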
Is this safe for the fallback path?
The same failure modes exist on the finalized path. The only difference is when we call it — using the safe vs the finalized height. If `safe_heights` is valid (which `load_safe_block_heights` ensures), the call should succeed.
However, I see now from the Slack thread that this might be unrelated to the mainnet issue you're seeing. Should we hold off on this PR until the root cause is clearer?
i mean in what cases will using safe height to clear the WAL result in issues
i.e. what is the behavior when Ethereum has a deep reorg? How can a signet node recover from that?
what other edge cases have you considered?
You're right to push on this — I hadn't fully thought through the reorg implications.
Deep reorg scenario:
If we clear WAL up to safe height X, and then Ethereum reorgs past X:
- reth sends a revert notification for the reorged blocks
- `on_host_revert` tries to unwind the RU chain
- But reth may have already pruned the WAL entries we cleared — we can't replay from the ExEx side
Why this is problematic:
Safe blocks can reorg (unlike finalized blocks). A safe-block reorg requires >1/3 malicious validators, but it's not cryptographically impossible. If it happens, we'd be left in an inconsistent state.
Edge cases I missed:
- Beacon client temporarily reports stale safe/finalized (network issues)
- Checkpoint sync completing mid-operation
- Safe head progressing faster than we process during backfill
Conclusion:
You're right — using safe height for WAL clearing is not safe. The original code is correct: only clear WAL for truly finalized blocks.
The WAL warning on testnets without finality is annoying but not critical — it's a space issue, not a correctness issue. Better to live with disk growth than risk unrecoverable state.
Should I close this PR?
seems to be a red herring. if we see this behavior in testnets we can revisit
Summary
When the host chain doesn't report finalized blocks (beacon client not synced, checkpoint sync incomplete, or testnet without proper finality reporting), the WAL grows unbounded because `FinishedHeight` is never sent to reth. This was observed on Pecorino with the warning:
Root Cause
In `update_canon_heights()`, `FinishedHeight` is only sent when `finalized_ru_block_hash != genesis_ru_hash`. When the host doesn't report finalized blocks, `load_finalized_block_heights()` returns `{ host: 0, rollup: 0 }`, and `FinishedHeight` is never sent.

Fix
Add a fallback to use the safe block height for `FinishedHeight` when finalized blocks aren't available. Safe blocks are unlikely to reorg, making them a reasonable threshold for clearing the WAL. If there's a reorg beyond the safe block, reth will replay from the WAL as expected.
Closes ENG-1830